Drug repurposing based on the DTD-GNN graph neural network: revealing the relationships among drugs, targets and diseases

Motivation The rational modelling of the relationship among drugs, targets and diseases is crucial for drug retargeting. While significant progress has been made in studying binary relationships, further research is needed to deepen our understanding of ternary relationships. The application of graph neural networks in drug retargeting is increasing, but further research is needed to determine the appropriate modelling method for ternary relationships and how to capture their complex multi-feature structure. Results The aim of this study was to construct relationships among drug, targets and diseases. To represent the complex relationships among these entities, we used a heterogeneous graph structure. Additionally, we propose a DTD-GNN model that combines graph convolutional networks and graph attention networks to learn feature representations and association information, facilitating a more thorough exploration of the relationships. The experimental results demonstrate that the DTD-GNN model outperforms other graph neural network models in terms of AUC, Precision, and F1-score. The study has important implications for gaining a comprehensive understanding of the relationships between drugs and diseases, as well as for further research and application in exploring the mechanisms of drug-disease interactions. The study reveals these relationships, providing possibilities for innovative therapeutic strategies in medicine.


Introduction
In practical scenarios, many drugs exhibit diverse effects and mechanisms of action, impacting multiple biological processes.This suggests that a drug's therapeutic potential may extend beyond its known applications, potentially offering benefits in treating other previously unexplored diseases.This concept underlies the practice of drug repurposing, also referred to as drug reuse.Drug repositioning involves the repurposing of an approved drug, originally intended to treat a specific disease, to treat another disease.This process typically refers to drugs that have already undergone clinical trials and received regulatory approval for their safety and efficacy [1].
In drug retargeting research, the relationship among drugs, targets, and diseases is essential.Binary and ternary relationship models are used to explore these connections.Binary relationships examine pairwise or self-relationships among drugs, targets, and diseases, while ternary relationships involve all three entities.
Numerous studies focus on drug repositioning through binary relationships.Wang et al. [2] introduce CGINet networks, which integrate chemical-gene interactions to reveal drug effects on specific genes.The analysis of protein-protein interaction (PPI) networks is crucial for uncovering drug-disease associations and offers a promising therapeutic strategy by modulating PPIs [3].Zeng et al. [4] develop a graph neural network model for predicting drug-target associations by integrating multiple features.Cao et al. [5] classify drug-target pairs based on binding affinity using a chemical genomics framework and random forest.Li et al. [6] propose a comprehensive disease-gene association model based on parallel graph transformation networks.These studies contribute to drug repositioning research by exploring various approaches and methodologies.Zhao et al. [7] introduce a novel modality-aware MDA prediction model, MotifMDA, which is capable of achieving highly accurate MDA prediction through the fusion of high-order and low-order structural information.The integration of base-order level structural information enables MotifMDA to identify new MDAs from diverse perspectives.
While these studies have contributed to the field of drug repositioning, they have limitations.They mainly focus on the relationship between drugs and a single target or disease, but ignore the complex ternary relationship among drugs, targets, and diseases.
In addition to the traditional drug-disease and drugtarget binary relationships, recent studies have shown the importance of triple drug-target-disease interactions in the human metabolic system [8][9][10].Some methods have been proposed to construct ternary relationships, such as tensor factorisation to infer missing drug-target-disease tensor entries [8,10,11], Qu [12] et al. develop the concept of event graphs and use nodal prediction methods to study drug repurposing.
The research presented above proposes a method for constructing ternary relationships.However, it is acknowledged that these methods may have limitations.For instance, tensor decomposition may not be appropriate for handling large-scale and sparse data sets, and the performance of the event graph model may be restricted by the complex characteristics of nodes when predicting nodes.Therefore, further refinement and optimization may be required to enhance its accuracy and scalability.
GNN has made significant progress in several areas.For instance, the new two-channel hypergraphic convolutional network (HGHDA) model [52] enables the encoding of higher-order relationships between drugs, their constituents and diseases, in order to derive predictions with scoring functions.However, GNN still faces several challenges in drug repositioning.One of the main challenges is effectively linking the relationship among drugs, targets, and diseases so that GNN can learn their interactions.Furthermore, the relationship among drugs, targets, and diseases is complex, requiring GNN to capture and understand multi-level interactions.This involves the interaction between drugs and targets, targets and diseases, and drugs and diseases.
To overcome these shortcomings, a ternary relationship approach to drug repositioning research is presented.By exploring the interactions among drugs, targets and diseases, the underlying mechanisms of drug repurposing can be revealed.Our study creates event nodes that represent the possible relationships among drugs, targets, and diseases.We then establish a map of isomerized event-disease relationships to capture the connections among drugs, targets, and diseases.In addition, a Drug-Target-Disease graph neural network (DTD-GNN) will be constructed to model the relationship among drugs, targets and diseases, and the proposed method will be validated through a link prediction approach.The research aims to provide new insights and methodologies for drug repurposing, contributing to advancements in drug development.
In summary, the main contributions of this work are provided as the following: • We constructed a dataset based on the relationship among drugs, targets, and diseases, which includes the association information among them.Additionally, we introduced event nodes to represent the ternary relationship among them.• Based on the associations among drugs, targets, diseases, and events, we constructed a heterogeneous graph structure to represent the intricate relationships between these entities.

Methods
In this section, we first introduce the concept of an event node, which represents the ternary relationship among drugs, diseases, and targets.And then explain the modeling process of the event-disease heterogeneous graph.Finally, a new heterogeneous graph is established to represent the relationship between events and diseases.The study further explorers the relationship between these entities through link prediction.

Construction of event nodes
To enhance our understanding of the relationships among drugs, targets, and diseases, we used an event node to model their interactions.We drew inspiration from Qu et al. 's [12] event graph and developed event nodes to capture and consolidate the relationships among these three entities.Given a set of drugs, targets, and diseases, a simple relationship diagram can be constructed to illustrate their interactions, as depicted in Fig. 1.The diagram illustrates the binary relationships between drugs and targets, and targets and diseases.However, this approach fails to capture the interdependent and inseparable nature of drugs, targets, and diseases.To establish a comprehensive and unified understanding of these relationships, the concept of events is introduced.
For all entities in the data (all drugs, targets and diseases are considered as one entity), if there is a relationship among drugs, targets and diseases, that is, if a particular drug X i and a particular target Y i are combined so that some corresponding diseases can be treated Z = {Z 1 , Z 2 , .., Z z } (where Z is the collection of disease nodes that can be treated), then the relationship between them can be defined as an event node Q =< X i , Y i , Z > .For example, in Fig. 2, if the combination of X 1 and

Heterogeneous graph construction of event node and disease node
The graph neural network comprises two main types: homograph and heterogeneous graph.In this study, we distinguish between event nodes and disease nodes as distinct types of nodes, enabling us to construct a heterogeneous graph based on their relationships.
In the event node representation, denoted as Q =< X i , Y i , Z > , and the disease node Z i , we gain an edge between the event node Q and the disease node Z i if Z i ∈ Z .This edge represents a "link" relationship between the event and the disease.For a clear and intuitive presentation, we present a partial structure of the heterogeneous diagram illustrating the event-disease relationships, as shown in Fig. 2.

DTD-GNN model construction
Our proposed DTD-GNN model utilises a graph convolutional network to capture the feature relationships between events and diseases, while using a graph attention network to handle the ternary feature relationships.A gate unit is employed to extract relevant features and obtain a new result for feature processing.The model is comprised of the following components:

Data feature construction
This section will focus solely on the construction of feature initialization for events, as the association between event and disease has already been elaborated.
To ensure the randomness and diversity of the disease types studied, we employ a random construction method for disease feature construction, as the constructed relationship is an event-disease association.We construct the event feature according to our definition of an event and combine it with the randomly initialized event feature during model training.This approach enables the full utilization of prior knowledge during training, promoting robustness and mitigating overfitting when extracting event features.Our construction Fig. 2 Heterogeneous graph of partial relationship between event and disease.To better represent the inherent combination of drugs and targets within event nodes, we enhance the heterogeneous graph depicting the partial relationship between events and diseases.In addition to the event nodes, we include nodes for drugs and targets in the graph method adopts the One-Hot encoding technique, which is explained below: • First, we create a feature matrix A of size q × z for the identified event and disease types.All elements of this matrix are initially set to 0. • Next, for each row in the feature matrix A (corresponding to each event), we iterate over each column to determine the presence or absence of disease associations.Specifically, if Z j ∈ Q i ( i ∈ {0, 1, ..., q} , j ∈ {0, 1, ..., z} ), we set the element A ij to 1. Other- wise, if there is no association between the event and disease, A ij remains 0. • To generate the event feature matrix A of size q × z , each row of events is looped through until all events have been traversed.During each iteration, the corresponding row in the feature matrix A is updated based on the associations between the events and diseases.Once all events have been iterated through, the event feature matrix A is constructed, where each row represents an event and each column represents a disease type.The matrix A q×z reflects the event- disease associations captured.

Encoder design
We employ graph convolutional and graph attention networks to extract features from event-disease and ternary relationships.Gate units [53] optimise the features, and residual connections facilitate feature fusion.
• Mapping of feature dimensions To ensure that event nodes and disease nodes share the same dimensional features, their node features must be mapped to a common space.For the category of nodes h i , we apply the transformation matrix M(Proj), as shown in Fig. 3.The mapping formula is as follows: • Convolution and attention of node dimension The graph convolution method is used to process the convolution operation on the nodes, which involves sampling and aggregating nodes.Firstly, a specific number of nodes are randomly sampled from the neighbors of each node i, denoted as N(i).Then, the feature of each node i, h ′ i is aggregated with the features of its neighboring nodes h ′ N (i) .The aggregation function can be expressed as follows: (1)

Fig. 3 Node feature dimension mapping graph
Where AGGREGATE is the aggregation function, h ′ i and h u for u in h ′ N (i) represent the characteristic of node i and the feature set of neighbour node N(i) respectively.Finally, the new representation of node i, h ′′ i is obtained by concatenating the aggregated features Z h ′ i with the current representation h ′ i of node i.The concatenated features are then mapped through a fully connected layer and activation function.This mapping operation can be expressed as follows: Where W is the weight matrix of the fully connected layer, CONCAT represents the join or concatenation operation, and σ denotes the activation func- tion.The graph attention mechanism is introduced in the attention operation on the node to extract features using different attention weights.The attention mechanism is defined for each node type v to capture the semantic relationships between nodes.It is then used to calculate the attention weight between node i and its neighbor node j: Where W v is the weight matrix for node type v, a denotes the parameter vector for the attention mechanism, || and σ respectively represent the link opera- tion and the activation function.Then, for each node i, the attention weight is utilized to aggregate the neighboring nodes, resulting in a semantic-level representation.This process combines the features of the neighboring nodes based on their attention weights, resulting in a comprehensive representation.
Where N v i indicates the set of neighbor nodes of type v of node i.Finally, for each node i, its node-level representation and semantic-level representation can be merged to obtain the final node representation.This fusion process combines the local information from the node-level representation and the semantic information from the semantic-level representation, resulting in a comprehensive and enriched representation that captures both the specific characteristics of the node and its semantic relationships within the graph. (2) Where v indicates the type of node i, CONCAT is a concatenation or merging operation.
• Gate unit The gate unit is a specialised structure used to control information flow and filtering in convolution.These units regulate information and memory flow through activation functions and dot product operations.The introduction of gate units enables the model to capture long-term dependencies more effectively and achieve superior performance in processing feature information.The formula for calculating the gate unit can be expressed as follows: Where h i is the final output value, W i and b are a learnable weight matrix and a learnable bias vector, σ represents the sigmoid activation function.

Decoder design
When studying the relationship between events and diseases, link prediction is used to assess the potential association between the two.Link prediction is a more intricate task than node classification as it involves using node embeddings to predict edges in a graph.The process of link prediction is illustrated in Fig. 4.
• First, The node embeddings are generated by the encoder through the application of L convolutional layers to process the input graph.
embedded dimension.This generates a probability value representing the likelihood of the edge's existence for each edge.

Loss function and optimizer selection
The BCEWithLogitsLoss loss function is appropriate for binary classification problems, which is applicable to the link prediction task in our drug repurposing project.This task involves determining whether a link exists between a given event and a disease.The BCEWithLogitsLoss function effectively handles this binary classification scenario.
In drug repurposing link prediction, positive and negative samples are often imbalanced, with a significantly larger number of negative samples compared to positive samples.To balance the influence of positive and negative samples and enable better handling of unbalanced data, BCEWithLogitsLoss can be weighted for different sample categories.The Adam optimizer is suitable for the drug repurposing link prediction task because of its adaptive learning rate adjustment based on the gradient range and parameter variations.In this task, different event-disease relationships may exhibit diverse gradient properties, and Adam can automatically adapt the learning rate to optimize performance in various scenarios.Furthermore, the Adam optimizer exhibits rapid convergence in the early stages of training, often achieving superior results in fewer iterations.Moreover, the Adam optimizer performs well when handling large-scale data and parameters, effectively optimizing the model.
We have successfully developed our DTD-GNN model based on the information provided.The model's complete process is illustrated in Fig. 5.

Data preparation
We collect paired records from the publicly available BioSNAP dataset [54], which contains information on drug-target, drug-disease and target-disease relationships.We extract a total of 15,140 drug-target, 4,66,658 drug-disease and 15,509,620 target-disease pairs from the BioSNAP dataset.These pairs represent a one-to-one correspondence between drugs and targets, drugs and diseases, and targets and diseases.By merging these pairs based on their relationships, we create a ternary relationship known as the drug-disease-target relationship, which forms the event node.The detailed statistics of the paired datasets are shown in Table 1, and the node statistics and event-disease relationships are shown in Table 2.
A heterogeneous graph is constructed based on the correlations between events and diseases.The edge set of the dataset is divided into three subsets for training, validation, and testing purposes.The training set is allocated 60% of the edges, the validation set is allocated 10% , and the remaining 30% is allocated to the test set.The results are then experimentally validated using these subsets.

Experimental verification
The experiment consists of four main components.Firstly, the optimal parameter settings for the DTD-GNN model are determined through parameter adjustment and training, using the BioSNAP dataset.Secondly, an ablation experiment is conducted on various aspects of the DTD-GNN model to identify the optimal structure.Additionally, we validated our approach by treating the relationship between events and diseases as link prediction.We compared our model's link prediction performance with other models' node classification performance using the AUC indicator.Finally, we selected heterogeneous graph neural network models commonly used for link prediction to demonstrate the advantages of our model.The next section will explain these indicators.
• Accuracy is a metric that measures the percentage of correctly predicted results out of the total sample.
Where TP stands for true positive, indicating that the prediction is true and aligns with the actual positive class.FP refers to false positive, signifying that the prediction is true, but it contradicts the actual negative class.FN represents false negative, indicating that the prediction is false, despite the actual   class being positive.TN means true negative, denoting that the prediction is false, and it aligns with the actual negative class.• The accuracy rate, also known as the precision rate, represents the probability that a sample predicted as positive is indeed positive.
• The recall rate, and known as the sensitivity or true positive rate, represents the probability of a positive sample being correctly predicted as positive.
• The F1 value is a metric that considers both the precision rate and recall rate simultaneously, aiming to achieve a balanced performance.
• The AUC (Area Under the Curve) index is calculated by measuring the area under the Receiver Operating Characteristic (ROC) curve, which is commonly used for comparing the performance of different models.The AUC value serves as an indicator of model performance, with larger values indicating better performance.Typically, the AUC value ranges between 0.5 and 1.The ROC curve is a graphical tool used to evaluate the performance of binary classification models.It plots the true positive rate (sensitivity) on the vertical axis against the false positive rate (1 -specificity) on the horizontal axis.The True Positive Rate (Sensitivity) is the proportion of correctly predicted positive instances out of all actual positive instances.
The False Positive Rate (1 -Specificity) is the proportion of incorrectly predicted negative instances out of all actual negative instances.
• The AUPR metric is frequently used to assess the performance of binary classification models, especially for tasks that involve imbalanced datasets or a focus on positive samples.To evaluate the indicators mentioned above, we conduct experiments and obtain results.The details of our experimental environment are presented in Table 3.

DTD-GNN model parameters
We perform tuning and iterative training on our model using the experimental environment and the BioSNAP dataset.This process entails optimizing seven crucial parameters: learning rate, loss function, optimizer, batch size, dimension number of node embedding, dropout rate, and weight decay.After extensive experimentation and continuous adjustments, we arrive at the final set of parameters for the trained DTD-GNN model.These parameters are presented in Table 4 below.

Ablation experiment
We conduct an ablation experiment on the core components of our DTD-GNN model to evaluate their impact on overall performance.The experiment focused on several factors, including the number of convolution layers (ranging from 1 to 3 layers), the separation and merging of the graph convolution layer and graph attention layer, and different decoding methods (such as feature product fusion, unilinear feature fusion, and bilinear feature fusion).The results of the experiments are presented in Table 5, with values rounded to five decimal places.The best results are highlighted in bold.For each ablation experiment, the default setting indicates that the structure of the other model components remained unchanged while only one component's structure was modified.
Table 5 shows that the performance of the DTD-GNN model is significantly impacted by different component structures.The number of convolution layers has minimal effect on the model's performance, with most indicators showing insignificant changes.However, increasing the number of convolutional layers to three reveals interesting phenomena.Specifically, the addition of a convolutional layer results in a decrease in Precision and an increase in Recall.Precision represents the proportion of predicted positive samples that are actually positive, while Recall represents the proportion of correctly predicted true positive samples.This suggests that the DTD-GNN model focuses more on quantity rather than quality when predicting positive examples.Based on this analysis, we select two convolutional layers as the optimal component structure.Our goal is to predict as many correct samples as possible while maintaining a high level of accuracy.
The experimental comparison was made between two different graph network structures: graph convolution network and graph attention network.The results indicate that the performance of the six indicators has significantly decreased when compared to the DTD-GNN model.This is due to the limitations of using a convolution network alone to process drug-target-disease data.The convolution network can only capture local neighborhood information in graph data, potentially ignoring the relationship between distant nodes.This limitation can lead to the convolution network failing to make full use of global information in complex drugtarget-disease relationship networks, ultimately affecting prediction performance.Additionally, using graph attention network alone has limitations in processing drug-target-disease data.While it can adjust the weight between nodes as needed, its computational complexity is high, particularly when dealing with large-scale graph data.This can result in low training and reasoning efficiency, which restricts its practical application.Additionally, it is challenging to model long-distance relationships.While attention mechanisms can capture global information to some extent, they may not transmit enough information when dealing with nodes that are far apart.This is because the attention weight may be attenuated or diluted during the propagation process, leading to limited information exchange between distant nodes.
In our experiments, we evaluated two additional decoders: the singlinear feature fusion decoder and the bilinear feature fusion decoder.The singlinear feature fusion decoder showed significant performance degradation compared to the DTD-GNN model across six metrics, indicating its ineffectiveness for drug repurposing tasks.However, the bilinear feature fusion decoder exhibits slightly lower performance in certain metrics compared to the DTD-GNN model's product feature fusion decoder.Nevertheless, the overall difference is not substantial, indicating that the bilinear feature fusion decoder has potential for drug repurposing tasks and may offer advantages in specific scenarios.Based on the experimental findings, however, the product feature fusion decoder remains the preferred choice.It is demonstrated that this decoder outperforms others, effectively utilizes feature information, and is suitable for predicting drug repurposing tasks.
During the investigation of the gate unit, we conducted experiments to assess its significance by removing it from the model.However, the experimental results showed a consistent downward trend across six key metrics after the removal of the gate unit.The gating unit controls the flow of information through the learned weights, allowing for more accurate capture of the key information in the graph data.By introducing the gating unit, the model's perception of global information can be improved, enhancing its flexibility and scalability.The gating unit can adjust the information transmission intensity between nodes, optimizing the model's performance for the task.Removing the gating unit reduces the model's expressiveness and generalization capabilities, leading to a decline in the accuracy of predicted results.

Comparison experiment
• Comparison experiment between node classification and link prediction task In our study, we treat the event-disease relationship as a link prediction task.Graph neural networks are commonly used for node classification and link prediction in drug relocation tasks.To demonstrate the effectiveness of this approach, we conduct a comparative experiment using eight widely graph neural network models.We ensure a fair comparison by selecting these models: GCN [48], GraphSAGE [50], GIN [51], HIN-2Vec [55], HGT [56], Event2Vec [57], HGNN [58], and EGNN [12].The test results of these models are compared with the link predictions of our DTD-GNN model using the AUC metric.The experimental results, presented in Table 6, display the results rounded to three decimal places, with the best-performing model highlighted in bold.The data presented in Table 6 indicates that using graph neural network models for node classification tasks yields inferior results compared to link prediction tasks when studying the relationship between events and diseases.The aim of the node classification task is to categorise nodes within a graph.However, accurately distinguishing between categories can be difficult due to limited information on node connections and features, particularly for complex heterogeneous graphs.
Events are entities with a structured format that involve ternary relationships.Therefore, the relationship between event nodes and disease nodes can be intricate, and the expression of node features may be incomplete or contain noise.Link prediction lever-ages existing connection patterns to predict connections between unlinked nodes, allowing for the inference of potential relationships based on the graph's connectivity information.Graph neural network models are ideal for link prediction tasks as they produce superior results in studying the relationship between events and diseases.• Comparison experiment of models in link prediction tasks To assess the effectiveness of our DTD-GNN model in predicting links, we compared it with other commonly graph neural network models.For this analysis, we chose five models: GCN [48], GraphSAGE [50], GAT [26], GIN [51], and HGT [56].Additionally, we used publicly available code to train, validate, and test the model on the dataset.The experimental results are presented in Table 7 with five decimal places.The best-performing results are highlighted in bold.Additionally, the AUC and AUPR curves of the models are visualised in Fig. 6, providing insights into the models' performance across different thresholds.The experimental results presented in  tionship between events and diseases, and predict this relationship by constructing a heterogeneous graph structure involving drugs, targets, and diseases.The primary aim of this task is to ensure high precision and confidence in predicting event-disease relationships.To achieve this, emphasis should be placed on the quality of the predicted positive samples.Precision is a measure of the accuracy of our predicted positive samples, which reflects the overall quality of our predictions.Recall measures the proportion of correctly identified positive samples out of all actual positive samples, prioritising quantity over quality.However, in the specific context of studying event-disease relationships, it is crucial to ensure the precision of high-quality associations rather than identifying a larger number of low-quality associations.Therefore, although the DTD-GNN model may exhibit relatively lower values in terms of accuracy and recall, it excels in precision and provides high-confidence predictions when studying the relationship between events and diseases.Overall, it outperforms the GraphSAGE model in this task, particularly in terms of predicting the quality and confidence of positive samples.The DTD-GNN model's optimisation towards task-specific objectives enhances its value and usefulness in addressing the drug repositioning problem.In drug repositioning tasks, it is common to encounter class imbalance issues, where there is an unequal distribution of positive and negative samples.In such scenarios, relying solely on accuracy as an evaluation metric is insufficient to provide a comprehensive assessment of model performance.This limitation arises from the potential bias of the model towards predicting a larger number of samples as the majority class, which can lead to biased predictions, particularly for negative samples.Therefore, the accuracy metric may be compromised and may not accurately reflect the overall effectiveness of the model.The DTD-GNN model is proficient in acquiring unique representations of diseases and events.It efficiently employs association information to capture complex relationships, resulting in comprehensive feature representations and improved performance.The model gains a deeper understanding of the intrinsic characteristics of these entities by utilizing feature embedding techniques, enabling accurate capture

Visual presentation of the DTD-GNN model
We have conducted visualizations of the predictions made by the DTD-GNN model.Considering the large number of nodes and links between events and diseases, we have selected 100 link relationships for visualization purposes.This enables direct observation and examination of the connections between events and diseases, as depicted in Fig. 7.
The analysis of Fig. 7 shows a clear correspondence between the prediction effect diagram of our DTD-GNN model and the actual relationship diagram.The visualization provides an intuitive understanding of the mutual relationships between event nodes (depicted as red nodes) and their corresponding disease nodes (depicted as blue nodes).These observations provide additional evidence for the effectiveness of our DTD-GNN model in accurately capturing and representing the connections between events and diseases.The relationship diagram uses solid lines to represent the relationship information used for model training.Dotted lines illustrate the relationship information between nodes that require prediction during model testing.The green color in the prediction effect diagram signifies the successful predictions made by the DTD-GNN model regarding the test edges.Based on the diagrams, it can be concluded that the DTD-GNN model accurately predicts the edges of the test data with a high degree of certainty.

Case study
Based on the outstanding performance of our framework, we have selected event No.2637 (drug DB00543, namely Amoxapine, is a tricyclic antidepressant used in the treatment of neurotic or reactive depressive disorders and endogenous or psychotic depression, target P50406) for analysis to investigate its potential relationship with various diseases.The model was pre-trained on BioSNAP data to predict the existence of a relationship between current events and diseases.
The model's prediction for event No.2637 indicates a connection between the event and 100 out of 3111 diseases.This suggests a relationship between the event and these 100 diseases.The predicted disease results associated with this event closely align with the actual correlation, indicating that the combination of the DB00543 drug and P50406 target in this event has a positive impact on the treatment of these 100 diseases.Table 8 displays the 100 diseases associated with this event.
Furthermore, the predictions indicate that a combination of drug DB00543 and target P50406 can be used to treat two common diseases.

Conclusion
In the paper, we introduce the use of event nodes to establish a ternary relationship among drugs, targets, and diseases.The effectiveness of the proposed eventdisease heterogeneity map is evaluated on the BioSNAP dataset.Additionally, a new DTD-GNN model is introduced, which combines graph convolution network and graph attention network to accurately represent the complex relationship between drugs, targets, and diseases through feature representation learning on heterogeneous graphs.
The DTD-GNN model demonstrated impressive performance across various classification metrics.These results validate the effectiveness and superiority of our model in exploring the relationships between drugs, targets, and diseases.The study has significant reference value for understanding the mechanism of drug action on diseases, drug repositioning, disease treatment, drug development, and treatment strategies in related fields.Additionally, the research contributes to the advancement and implementation of optimising drug design, drug screening, and drug repositioning.This provides guidance for improving drug effectiveness and reducing side effects.

Fig. 4
Fig. 4 Link prediction flowchart, where Node Pair Multiplication represents the dot product of the compute node embeddings and Aggregate Embed Dim represents the value of aggregating the entire embeddings dimension TP + TN + FP + FN

Fig. 5
Fig. 5 DTD-GNN model diagram, where (b) and (c) correspond to the network structure in the diagram, and (d) corresponds to features between nodes in the yellow section

•Fig. 7
Fig. 7 The figure presents a diagram showcasing the relationship between event and disease nodes.Disease nodes are denoted by color while event nodes are represented by the color red.In (a) graph, it displays the initial link between events and diseases, with dotted lines indicating the entity relationships requiring prediction.The (b) graph illustrates the predictions made by our DTD-GNN model for the event-disease link.The solid green lines represent the discernment results of the DTD-GNN model regarding the predicted entity relationships

Table 2
The number of nodes and event-disease relationships

Table 3
Experimental environment configuration

Table 4
Parameters of the DTD-GNN model

Table 6
The experimental results comparing the node classification task and the link prediction task

Table 7
The comparative experimental results of the models in the link prediction task

Table 8
Diseases related to the event No.2637